Mean field approximation for solving QUBO problems

The Quadratic Unconstrained Binary Optimization (QUBO) problem is NP-hard. Some exact methods like the Branch-and-Bound algorithm are suitable for small problems. Some approximations like stochastic simulated annealing for discrete variables or mean-field annealing for continuous variables exist for larger ones, and quantum computers based on the quantum adiabatic annealing principle have also been developed. Here we show that the mean-field approximation of the quantum adiabatic annealing leads to equations similar to those of thermal mean-field annealing. However, a new type of sigmoid function replaces the thermal one. The new mean-field quantum adiabatic annealing can replicate the best-known cut values on some of the popular benchmark Maximum Cut problems.


Introduction
From solid state physics [1] to social phenomena [2] Ising models can describe a wide range of complex systems. Spin models are versatile because they are simple yet able to show complex phenomena, like emergence and phase transitions [3][4][5]. Spin models are also important in large, real-life optimization problems which can be distilled down to finding the global minimum of a high-dimensional, nonlinear function. Most of these tasks are NP-hard [6]. Small problems up to approximately a hundred nodes for a dense problem can be solved by exact methods like the Branch-and-Bound (B&B) [7], or Branch-and-Cut [8]. Larger problems can be solved approximately with intuitive methods like the tabu search [9,10], semi-definite programming [11][12][13], stochastic simulated annealing (SSA) [14][15][16] and quantum annealing [17][18][19]. Quantum annealing is also the key idea behind adiabatic quantum computers [20,21] such as the D-Wave system [22][23][24][25]. To reduce the computational costs of stochastic methods, the mean-field annealing (MFA) [26] has been proposed. In MFA, the dynamics of spins are replaced with the evolution of their average values. As the temperature decreases, the MFA algorithm updates these averages based on their values at the previous temperature. Computation using the means reaches equilibrium faster than the corresponding stochastic dynamics, and MFA relaxes to a solution much faster than SSA, leading to an overall decrease in computational effort. Such methods have become popular lately, and special devices like coherent Ising machines [27,28] have been developed.
In this paper, we develop the mean-field version of the quantum annealing [17] method and apply it to the maximum cut problem, which belongs to the larger set of Quadratic Unconstrained Binary Optimization (QUBO) problems. The mean-field quantum annealing introduced here is similar to [18,19] in spirit. The novelty is the proposed quantum MFA is that the equations derived here are fully analogous to the thermal MFA equations, the tunneling rate of the quantum MFA corresponds to the inverse temperature, and the sigmoid theta function is replaced with a new and different sigmoid function. The new sigmoid function improves the performance of the quantum MFA, which is in line with recent result [29] emphasizing the importance of the shape of the sigmoid function.
The structure of this paper is the following. In section 2 we summarize the thermal meanfield annealing method for the Ising problem. Then in section 3 we present the quantum annealing problem and its proposed novel mean-field solution. In section 4 we test the new method on the maximum cut problem benchmark G-sets [30], and finally, in section 5 we compare the new method with SSA and B&B.

Thermal mean-field annealing
The variational principle of statistical physics is a powerful tool for examining interacting systems at finite temperatures. The most straightforward version of this principle is the meanfield approximation. This approximation works best when the number of interactions per spin is large. In real-life optimization problems, the number of nodes is usually not as large as in physics, typically just tens of thousands, and usually, they have no symmetry. The statistical physical derivation of the mean-field annealing starts from the expression of the free energy at finite temperature, T, where the sum goes over all the system states, and E n is the energy. The probability of finding the system at state n is proportional to the Boltzmann weight e À E n =T , therefore, the free energy coincides with the ground state energy at zero temperature. The problem is that even if we know all the E n energies, apart from the most straightforward cases, we cannot evaluate the summation due to the exponentially large number of terms. The variational principle states that the exact free energy is always smaller or equal to variational free energy where P n is an arbitrary probability distribution. The closer the probabilities P n are to the real ones, the lower F V (T) will be. In practice, the distribution can't be too complicated since we have to calculate the variational free energy analytically, avoiding exponential summations. The typical strategy is to consider a class of probability distributions P n (m) parametrized by some set of parameters m, then calculate F V (m, T) and finally find the minimum m � . This solution is temperature dependent and we will refer to it as m � (T). The variational free energy is then where J ij is the interaction between spin i and j, and h i is the external magnetic field. In the mean field approximation, we assume that a variational distribution factorizes. In the case of spin systems This distribution is normalized and the expectation values of the spins are the parameters hS i i = m i . The variational free energy is We have to find its minimum, so its derivative has to be zero and the second derivative, the Hessian should be positive definite We can rewrite the implicit Eq (6) as a self-consistent equation: If m(T) is known for a given temperature T, we can determine it for a slightly lower temperature m(T − ΔT) by using the self-consistent equation iteratively as it was proposed in [26]. This method requires fewer function evaluations than SSA. Above some temperature T > T 0 the iterative equation converges towards a unique high temperature solution m i � h i /T. There are more solutions at lower temperatures. At T 0 , a bifurcation occurs, and the iterations after that follow one of the new post-bifurcation solutions. Later on at successively lower temperatures T 1 > T 2 > T 3 . . . further bifurcations occur, and the iteration follows one of the new solutions. The bifurcation temperatures T n are those points, where one of the eigenvalues and consequently the determinant of the Hessian (7) becomes zero at the solution m i (T n ) of the self-consistent Eq (8). Due to the nature of the solution, the meanfield annealing is also called a bifurcation machine. The procedure described above leads to one of the local minima of free energy. In general, there is no guarantee that the method leads to the absolute minimum, but decreasing the size of cooling steps ΔT and increasing the overall running time in parallel, improves the chances of finding deeper minima [31].

Quantum mean field annealing
Instead of the free energy and temperature, we can use quantum mechanics and the adiabatic theorem to determine the system's ground state. The adiabatic theorem asserts that if a quantum system is initially at the ground state and the corresponding time-dependent Hamiltonian operator changes sufficiently slowly, there is a gap between the ground state energy and the rest of the spectrum. The system remains at the instantaneous ground state [32]. An application of this theorem is to find the ground state of a complicated Hamiltonian. The ground state of the initial Hamiltonian H i should be easy to prepare, and the final operator H f is the one whose ground state we are interested in. In that case, we can compose the Hamilton operator of the adiabatic problem where s(t) is a continuous, monotonic function of time, with s(0) = 0 and s(T A ) = 1. The time T A is the annealing time, which must be large. The simplest choice for this is s(t) = t/T A . To solve the Ising problem with the adiabatic method we have to use the quantum version of the Ising model, as the final Hamilton operator where σ z is the Pauli z-matrix. For the initial Hamilton operator the standard choice is the transverse magnetic field term where Δ is the strength of the external field. This transverse term plays similar role to the entropy term in Eq (2) and it induces tunneling between the minima of the Ising Hamiltonian. The initial ground state is which is a product state, and after the annealing the final state is the ground state of Eq (12) and from that we can determine the minimal energy spin configuration of Eq (3). For a given H(t) the state of the system is governed by the Schrödinger equation.
Initially the system is at ground state: |C(t = 0)i = |C 0 i, If the annealing time T A is sufficiently large, then the evolving wave function reaches the ground state of the Ising model. To develop an MFA to the quantum problem we introduce the energy functional At s = 0, we set the initial wave function to the initial state |C(0)i = |C 0 i and for s > 0, we want to keep it in the minimal energy state. In the mean-field approximation, just like in the thermal case, we assume that the spins are independent and the trial state vector is a product where the amplitudes are constrained by |c i# (s)| 2 + |c i" (s)| 2 = 1. The expectation values of the Pauli operators can be expressed with the amplitudes as where m x i , m y i and m z i are real. They satisfy ðm It is easy to show that c i" and c i# can be chosen to be real, and in that case m y i ðsÞ ¼ 0. In terms of the spin expectation values the energy functional is ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi where we used m The first part of Eq (20) is the same as in (6). The difference is in the second term, as it is seen in Fig 1. Now, we can rewrite the implicit Eq (20) as a self-consistent equation where the tunneling rate is Γ = Δ(1 − s)/s and we dropped the superscript m i ¼ m z i . The new sigmoid function is sðxÞ ¼ x ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi This new equation is the main result of the paper. It is the equivalent of the thermal selfconsistent Eq (8), where the tunneling rate Γ is the analog of of the temperature T and σ(x) replaces tanh(x).
The iterative solution of the new equation is similar to the thermal case. Given m i (s = 0) = 0 we would like to determine m i (s = 1). The m(s) vector is minimizing the energy functional E (m(s)), where we assume that the s 7 ! m(s) trajectory is continuous. Without the external magnetic field the Ising model has a Z 2 symmetry. That means the m z i ¼ 0 is a solution to Eq (20), and it is a minimum as long as the smallest eigenvalue of the Hessian is above zero. For the parameter s this means where λ max (J) is the largest eigenvalue of the matrix J ij . We can set Δ to 1 and rescale J ij so that its largest eigenvalue is also 1. Now the trivial solution holds until s reaches 0.5. That means it is sufficient to start the simulation from s = 0.5. Once m i (s = 1) is known in the simulation, we round it to either 1 or -1. This gives us the S spin configuration. Since the derivative of E(m z ) diverges as some m i approaches ±1 it is advantageous to use a different parametrization m i (ϑ i ) = cos(ϑ i ), m x i ðW i Þ ¼ ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi 1 À m 2 i p ¼ sinðW i Þ and then the derivative @E/@ϑ i = s(∑ j J ij sin(ϑ i ) cos(ϑ j ) + h i sin(ϑ i )) + (1 − s)(−Δ) cos(ϑ i ) is regular for all ϑ i .

Benchmarking
As it was summarized by Lucas [33], many NP hard problems can be formulated with the Ising model. One of them is the QUBO problem, where we have to find the binary variables x i 2 {0, 1}, minimizing the quadratic functional argmin{q(x)}, where q(x) = ∑ ij Q ij x i x j , and Q is a symmetric matrix. Replacing the binary variables with spin variables x i = (1 + S i )/2 leads to the Ising problem with parameters J ij ¼ À 1 2 Q ij and h i ¼ À 1 2 P j Q ij . The QUBO problem is then equivalent to finding the ground state of this Ising model. The NP-hard Maximum Cut problem [34] is often used as a benchmark for optimization algorithms. The task is to partition an undirected graph G ¼ ðV; EÞ into two subsets (S; V n S) such that the number of edges between these subsets is as large as possible. If the graph is defined via its adjacency matrix, where A ij is one, if i and j are connected and zero otherwise, then the corresponding cut value is During the simulation, the Z 2 symmetry is disadvantageous because, without the external field, the system remains in the m z i ¼ 0 solution forever. Adding some small random external magnetic field, h i breaks this symmetry. We can use this field as noise and run the simulation repeatedly for many random realizations. One such distribution is shown in Fig 2a for the set G3 from G-set [30], which is one of the favorite problems to benchmark the methods for solving the Maximum Cut problem. This is a random graph with 800 nodes and 19176 links. The h i components are randomly generated uniformly from (−A, A), where A is the amplitude, and one hundred trials were generated for all amplitudes. The mean values, the best values, and standard deviations are shown in Fig 2b. If the amplitude is small, we have a high average CV with a small deviation, on the other hand, if the amplitude is large, then the average CV is small, and the deviation is high. An optimal amplitude range exists, where the average is still large enough as well as the deviation, resulting in the largest best value from the sample. In the G3, this is around A = 0.05 − 0.1, where the best-known cat value 11622 has been achieved.
We tested our algorithm with other Max-Cut problems from the G-set. We focused on the smaller ones, i.e., the largest was the G22 graph with 2000 nodes. The results are summarized in Table 1. The last column shows the best values we could find in the literature [9,[35][36][37][38][39]. In most cases, our best results are the same as the best known; in the rest, it is close.

Comparison of the QMFA with SSA and B&B
Two widely used algorithms for discrete optimization problems are the stochastic simulated annealing (SSA), based on the Monte Carlo method, and the Branch-and-Bound method (B&B) [40]. The former is a metaheuristic algorithm, just like the QMFA, but has different hyperparameters. The latter is an exact algorithm where the tests were terminated after a fixed time due to the prior knowledge of its exponential time complexity. These two algorithms are used to compare the performance of the QMFA.
The pseudocode of QMFA is the following: We have used the Newton-conjugate-gradient method from the SciPy library for finding the local minimum. In this QMFA algorithm the noise amplitude A and the number of steps n are the hyperparameters, and once the vector S is known, the Cut Value is   G1  G2  G3  G4  G5  G6  G7  G8  G9  G10  G11  G12  G22 we did in Section 4. After we have determined n and A, we also need the success rate, i.e., the probability of finding the best-known Cut Value. For the G1 problem, out of 1000 trials, 61 were successful, so a rough estimation for the time needed to find the best value if the hyperparameters are known is 0.93 s � (1000/61) = 15.24 s. In the case of SSA we need an annealing schedule [41], which tells us how the temperature decreases. We have chosen the geometrical cooling schedule, where T k+ 1 = T k α, with α 2 (0, 1). The pseudocode of the SSA is algorithm: Stochastic Simulated Annealing input: E S , T init , α, n output: S S :¼ random vector with ±1 elements for k ≔ 0 to n i ≔ random integer from [1, N] Here DE i ¼ E S ðiÞ À E S , and S ðiÞ is the same as S except the ith spin is flipped. If the initial temperature is T init = 2max i ΔE i , then any initial trial i will be accepted with more than 50%, so it can be considered a sufficiently high temperature. The final temperature is T final = T init α n . Since in the G-sets the J ij elements are -1, 0 or 1, the smallest positive energy difference is 1, hence it is reasonable to choose T final = 0.05. Below that temperature it is unlikely to accept any state with higher energy than the current one. The parameter of the geometrical schedule is a ¼ ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi . The only remaining hyperparameter is n. Fig 4 shows   use n = 2 × 10 6 , then we need approximately 115 seconds to find the best known Cut Value of G1.
For the B&B method, we used the BiqCrunch implementation [40]. In one hour, the Cut Value it could find was 11524.
The success rate can be dissimilar for the other sets. For example, for G3, after the hyperparameter optimization in the case of the QMFA, it is 0.016, and in the case of SSA with n = 5 × 10 7 , it is 0.17. Even the running time is increased to t QMFA = (2.8 ± 1.8) seconds, resulting in the expected time for finding the best known Cut Value with QMFA to 175 seconds. With SSA, this expected time is 455 seconds, and the B&B method found a solution with the Cut Value of 11538 in one hour.
In the examples mentioned above, the QMFA outperformed both the SSA and the B&B. Still, for example, for the G2 problem, the QMFA was only able to find the best known Cut Value once out of 2000 trials hence it would be inconsistent to calculate the average success time. On the other hand, the SSA found 5 times the best Cut Value out of 100 trials with n = 5 × 10 6 and the expected success time is 1500 seconds. The B&B method found a state with CV = 11521 in one hour.
The G11 is special since the QMFA cannot find the best Cut Value, the SSA can find it on an average of 2500 seconds, but the B&B method needs only 512 seconds. This can be due to its simple structure since the G11 is a toroidal graph. On the other hand, an additional one hour is not enough for B&B to confirm that it is the best possible Cut Value. Since it is a toroidal graph (in this case, an 8 × 100 grid with periodic boundaries), we can solve it exactly with the transfer matrix method, confirming that 564 is indeed the best possible Cut Value.
Considering all of the facts above, we can state that no algorithm is universally dominant, but we can say the QMFA can find a good, in some cases the best, optimum in a relatively short time.

Conclusion and outlook
We have shown that the thermal and the quantum mean-field annealing are governed by similar equations, where the temperature and the tunneling rate play similar roles. This is due to the similarity of the entropic term in the free energy and the transverse term in the energy of the quantum approach. The sigmoid function in the quantum approach differs from the thermal one, and for large arguments, it changes like a power law as opposed to the exponential in the thermal case. As it was pointed out in [29] the specific nonlinearity of the sigmoid function affects the speed of the convergence of the mean-field approach. This can be one of the reasons for the relative success of the present approach in reproducing some of the best cut values for certain problems from the G-set. The approximation presented here is the most straightforward application of the variational method. The next step can be a more sophisticated version of the mean-field annealing that can be introduced by considering the correlations among the most coupled spins, which might lead to even better cut values in the future.